
Conversation

@Ewan-Keith

Description

Encountered a bug when:

  • "use_materialization_v2": False
  • config.contract.enforced: true
  • Multiple foreign keys are specified from one table to another (e.g. a fact table with multiple date values, each of which has a foreign key constraint to a common date dimension)
  • An initial full run of the model with the FK constraints is carried out, followed by an incremental run.

The bug is that on the initial full run the multiple FK constraints to the common table are created correctly on the Databricks table, but when an incremental run is then carried out these FKs are removed. Single FKs to other tables seem to be retained (this isn't replicated in the test case, in the interest of keeping it minimal).
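For illustration only (hypothetical table and constraint names, not the actual test models), the shape of the problem is roughly:

  -- Applied correctly by the initial full run: two FKs from one fact table
  -- to the same date dimension.
  ALTER TABLE fct_orders ADD CONSTRAINT fk_order_date FOREIGN KEY (order_date) REFERENCES dim_date;
  ALTER TABLE fct_orders ADD CONSTRAINT fk_ship_date FOREIGN KEY (ship_date) REFERENCES dim_date;

  -- After the subsequent incremental run, both constraints are missing from fct_orders.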

The bug is replicated in the integration test TestIncrementalMultipleFKsToSameTableNoV2: before the fix is applied it passes the first assertion (after the initial full run) but fails the second assertion (after the incremental run) with an empty set of constraints returned. A matching test, TestIncrementalMultipleFKsToSameTable, replicates the same logic using materialization_v2 and confirms the issue is not seen with the v2 materialization (it passes without any actual code changes).

The fix I've landed on in dbt/include/databricks/macros/materializations/incremental/incremental.sql is to treat constraints as one of the configuration changes that is explicitly applied on an incremental run when changes are detected (the same as for tags, tblproperties, and liquid_clustering). This gets the failing test passing.

I don't think this is necessarily a root-cause fix; the root issue is likely a change in constraints being falsely detected in this instance, but I wasn't sure where to start on that, and this fix seems sensible regardless (even if the root issue is resolved, it seems likely we'd want changes to constraints to be applied without forcing a full refresh where possible?). There's a unit test checking for any obvious bugs in the constraint diff checker, but it doesn't turn anything up (passes as expected).

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

… table to another aren't retained on an incremental run

Signed-off-by: Ewan Keith <[email protected]>
{% if liquid_clustering is not none %}
{% do apply_liquid_clustered_cols(target_relation, liquid_clustering) %}
{% endif %}
{% if constraints %}
Collaborator

What I'm confused by is that your description made it sound like constraints were being dropped, but we haven't removed or altered any logic that would cause them to drop. I'm digging in, but if you could provide more context, like what SQL is showing up in the logs, that would be appreciated.

Collaborator

By any chance, is the table with the PK not incremental? If the PK table is not incremental, during create or replace its PK will be destroyed and recreated, which causes the cascading delete of the FKs. If the PK table were incremental, then no PKs or FKs should have been deleted. So the fix here is basically saying: for incremental tables, if the constraints don't match, whether due to deletion or to the user changing them, reapply.
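A minimal sketch of that sequence, using hypothetical table names rather than anything from this PR:

  -- The non-incremental dimension is rebuilt with CREATE OR REPLACE TABLE, which
  -- drops the old table and its PK, cascading a drop of any FKs that reference it.
  CREATE OR REPLACE TABLE dim_date (
    date_key DATE NOT NULL,
    CONSTRAINT pk_date PRIMARY KEY (date_key)
  );

  -- The incremental fact table keeps its data, but its FKs to dim_date are now gone,
  -- so the incremental run has to re-apply them:
  ALTER TABLE fct_orders ADD CONSTRAINT fk_order_date FOREIGN KEY (order_date) REFERENCES dim_date;
  ALTER TABLE fct_orders ADD CONSTRAINT fk_ship_date FOREIGN KEY (ship_date) REFERENCES dim_date;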

Author

@Ewan-Keith Jan 8, 2026

Yeah, that's it! I was having trouble replicating the case so reduced our exact issue down in the integration test; I thought it was the multiple keys that were the issue, but that was a total red herring, it's the incremental -> non-incremental case like you say.

I've updated the integration tests to reflect this and replicated the behaviour (it fails on v1 before the change to incremental.sql and passes after).

@benc-db requested a review from sd-db as a code owner January 8, 2026 21:48
@benc-db
Collaborator

benc-db commented Jan 8, 2026

I'll run the E2E tests when there is less congestion on the resources. Thanks for raising this.

@benc-db
Collaborator

benc-db commented Jan 9, 2026

@Ewan-Keith not sure if you can see this, but we have a failure in the E2E tests:

01:01:10  1 of 1 START sql incremental model test17679204186774943539_test_constraints.stg_numbers  [RUN]
01:01:10  [WARNING]: Use revamped materializations based on separating create and insert.  This allows more performant column comments, as well as new column features.
You may opt into the new behavior sooner by setting `flags.use_materialization_v2` to `True` in `dbt_project.yml`.
Visit https://docs.getdbt.com/reference/global-configs/behavior-changes for more information.
01:01:17  1 of 1 ERROR creating sql incremental model test17679204186774943539_test_constraints.stg_numbers  [ERROR in 6.88s]
01:01:17  
01:01:17  Finished running 1 incremental model in 0 hours 0 minutes and 12.18 seconds (12.18s).
01:01:17  
01:01:17  Completed with 1 error, 0 partial successes, and 0 warnings:
01:01:17  
01:01:17  Failure in model stg_numbers (models/stg_numbers.sql)
01:01:17    Database Error in model stg_numbers (models/stg_numbers.sql)
  
  [PARSE_SYNTAX_ERROR] Syntax error at or near '('. SQLSTATE: 42601 (line 2, pos 113)
  
  == SQL ==
  /* {"app": "dbt", "dbt_version": "1.11.2", "dbt_databricks_version": "1.11.3", "databricks_sql_connector_version": "4.1.3", "profile_name": "test", "target_name": "default", "node_id": "model.test.stg_numbers"} */
  ALTER TABLE `peco`.`test17679204186774943539_test_constraints`.`stg_numbers` ADD CONSTRAINT fk_n FOREIGN KEY (n) (n) REFERENCES test17679204186774943539_test_constraints.raw_numbers

In TestIncrementalForeignKeyExpressionConstraint::test_incremental_foreign_key_constraint

You also need to add annotations to your new tests to skip testing against the "databricks_cluster" profile, since Hive doesn't support constraints.

@Ewan-Keith
Author


Yeah, I can see the failure; I'll try to find some time to understand why this change is causing the issue. I've added the test annotation, it must've just gotten lost at some point in editing.

@Ewan-Keith
Author

I think I've tracked down the issue: the changes interacted badly with the code that renders FK statement suffixes using the expression syntax (I haven't used it myself, so it was a bit of a blind spot).

I think the re-application of the FK constraints on the incremental model actually surfaced a pre-existing issue where the Python rendering of the FK SQL differed from the Jinja used for the initial creation. The failing functional test defines a FK constraint like:

expression: (n) REFERENCES schema.raw_numbers

The original _render_suffix() always prepended FOREIGN KEY (columns) and then appended the expression. This resulted in invalid SQL:

FOREIGN KEY (n) (n) REFERENCES schema.raw_numbers

I fixed this in the _render_suffix() method, but that caused one of the unit tests to fail. This was because the expression can either specify the FK columns or not; the unit test that failed didn't specify columns, so:

expression="REFERENCES other_table (other_id) DEFERRABLE"

became:

 FOREIGN KEY REFERENCES other_table (other_id) DEFERRABLE

instead of:

FOREIGN KEY (id, other) REFERENCES other_table (other_id) DEFERRABLE

I'm spelling all this out because, while I think the new fix should resolve these issues (all unit tests and functional tests under tests/functional/adapter/constraints/ pass locally), it's hardly an elegant solution: matching on ( at the start of the string to see whether columns are specified in the expression or not, and handling each case appropriately.
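Concretely, the two rendered forms the fix is aiming for would look something like this (hypothetical table and constraint names, echoing the examples above):

  -- Expression already carries its own column list: don't prepend the columns again.
  ALTER TABLE stg_numbers ADD CONSTRAINT fk_n FOREIGN KEY (n) REFERENCES raw_numbers;

  -- Expression omits the columns: fall back to the constraint's declared columns.
  ALTER TABLE some_model ADD CONSTRAINT fk_other FOREIGN KEY (id, other) REFERENCES other_table (other_id) DEFERRABLE;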

@benc-db
Collaborator

benc-db commented Jan 12, 2026

Hi @Ewan-Keith, we didn't make the cutoff for the 1.11.4 release; can you update the changelog entry to be part of the 1.11.5 release? We plan to put that out next week with your fix. Rerunning tests now.

@Ewan-Keith
Author

Ewan-Keith commented Jan 12, 2026

No worries! That's the changelog updated.

Hmm, looks like another failure in the run-cluster-e2e-tests suite. I've been running against a SQL warehouse locally. Is this suite the Hive metastore cluster?

@Ewan-Keith
Author

I've updated the key code change to:

        {% if constraints and not target_relation.is_hive_metastore() %}
          {{ apply_constraints(target_relation, constraints) }}
        {% endif %}

Looking around, the project does something very similar elsewhere already, e.g. over in https://github.com/databricks/dbt-databricks/blob/main/dbt/include/databricks/macros/relations/components/constraints.sql

  {% if relation.is_hive_metastore() %}
    {{ exceptions.raise_compiler_error("Incremental application of constraints is not supported for Hive Metastore") }}
  {%- endif %}

I only have a free Databricks account to hand at the moment, so unfortunately I just have a serverless warehouse to test with and can't validate whether this fixes things, but it feels like it should.

It does raise another question, which is why the same error isn't hit for liquid clustering/tags; maybe their tests handle this already.
